Disjoint pattern matching and implication in strings
We deal with the problem of whether a set of string patterns implies the presence of a fixed pattern. While checking whether a set of patterns occurs in a string is solvable in polynomial time, this implication problem is well known to be intractable. Here we consider a version of the problem in which the patterns in the set are required to be disjoint. We show that for this version the situation is reversed: checking whether a set of patterns occurs in a string is NP-complete, but the implication problem is solvable in polynomial time.
1 Introduction and the main result
The problem we consider in this note was motivated by answering queries in incompletely specified XML documents. Suppose that L is a set of letters, or labels, assumed to be countably infinite, together with a special wildcard symbol that is not in L. A pattern is a finite string over L extended with the wildcard. If a string s over L matches a pattern π, we write s ⊨ π. More precisely, if s = a0...an−1 an
Representing and Querying Incomplete Information: a Data Interoperability Perspective
This habilitation thesis presents some of my most recent work, which has been done in collaboration with several other people. In particular this thesis concentrates on our contributions to the study of incomplete information in the context of data interoperability. In this scenario data is heterogeneous and decentralized, needs to be integrated from several sources and exchanged between different applications. Incompleteness, i.e. the presence of “missing” or “unknown” portions of data, is naturally generated in data exchange and integration, due to data heterogeneity. The management of incomplete information poses new challenges in this context. The focus of our study is the development of models of incomplete information suitable to data interoperability tasks, and the study of techniques for efficiently querying several forms of incompleteness
When is naive evaluation possible?
Querying Incomplete Data : Complexity and Tractability via Datalog and First-Order Rewritings
To answer database queries over incomplete data the gold standard is finding
certain answers: those that are true regardless of how incomplete data is
interpreted. Such answers can be found efficiently for conjunctive queries and
their unions, even in the presence of constraints. With negation added, however,
the problem becomes intractable. We concentrate on the complexity of
certain answers under constraints, and on efficiently answering queries
outside the usual classes of (unions of) conjunctive queries by means of
rewriting as Datalog and first-order queries. We first notice that there are
three different ways in which query answering can be cast as a decision
problem. We complete the existing picture and provide precise complexity bounds
on all versions of the decision problem, for certain and best answers. We then
study a well-behaved class of queries that extends unions of conjunctive
queries with a mild form of negation. We show that for them, certain answers
can be expressed in Datalog with negation, even in the presence of functional
dependencies, thus making them tractable in data complexity. We show that in
general Datalog cannot be replaced by first-order logic, but without
constraints such a rewriting can be done in first-order. The paper is under
consideration in Theory and Practice of Logic Programming (TPLP).
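For the positive case mentioned in this abstract (conjunctive queries and their unions, without constraints), a standard way to obtain certain answers is naive evaluation: treat each marked null as an ordinary value, evaluate the query, and discard output tuples that still contain a null. The sketch below illustrates this on one conjunctive query. The `Null` class, the relations `R` and `S`, and the query itself are hypothetical examples, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Null:
    """A marked null: an unknown value identified by its label (illustrative)."""
    label: str

def join_query(R, S):
    """Evaluate Q(x, z) :- R(x, y), S(y, z), treating nulls as plain values."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def certain_answers_naive(R, S):
    """Naive evaluation: keep only the output tuples that contain no null."""
    return {t for t in join_query(R, S)
            if not any(isinstance(v, Null) for v in t)}

# Hypothetical incomplete database: n1 and n2 stand for unknown values.
n1, n2 = Null("1"), Null("2")
R = {(1, n1), (2, 3)}
S = {(n1, 4), (3, n2)}

answers = certain_answers_naive(R, S)
```

Here `(1, 4)` is kept because the same null `n1` links the two facts whatever value it takes, while `(2, n2)` is dropped because its second component is unknown. This shortcut does not extend to queries with negation, which is the setting the abstract is concerned with.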
A Simple Algorithm for Consistent Query Answering under Primary Keys
We consider the dichotomy conjecture for consistent query answering under
primary key constraints stating that for every fixed Boolean conjunctive query
q, testing whether it is certain over all repairs of a given inconsistent
database is either polynomial time or coNP-complete. This conjecture has been
verified for self-join-free and path queries. We propose a simple inflationary
fixpoint algorithm for consistent query answering which, for a given database,
naively computes a set $\Delta$ of subsets of database repairs with at most
$k$ facts, where $k$ is the size of the query $q$. The algorithm runs in polynomial
time and can be formally defined as: 1. Initialize $\Delta$ with all sets $S$
of at most $k$ facts such that $S$ satisfies $q$. 2. Add any set $S$ of at most
$k$ facts to $\Delta$ if there exists a block $B$ (i.e., a maximal set of facts
sharing the same key) such that for every fact $a$ of $B$ there is a set $S' \in \Delta$ contained in $S \cup \{a\}$. The algorithm answers "$q$ is
certain" iff $\Delta$ eventually contains the empty set. The algorithm
correctly computes certain answers when the query $q$ falls in the polynomial
time cases for self-join-free queries and path queries. For arbitrary queries,
the algorithm is an under-approximation: The query is guaranteed to be certain
if the algorithm claims so. However, there are polynomial time certain queries
(with self-joins) which are not identified as such by the algorithm
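The two-step fixpoint described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: facts are hypothetical triples `(relation, key, value)`, the query is passed as a Python predicate on sets of facts, and the "size of the query" is taken to be its number of atoms (`k`). The names `certain`, `delta`, `q_holds`, `db_ok`, and `db_bad` are all introduced here for illustration.

```python
from itertools import combinations

def block_of(fact):
    """Facts sharing relation name and key belong to the same block."""
    rel, key, _ = fact
    return (rel, key)

def is_consistent(facts):
    """A subset of a repair holds at most one fact per block."""
    ids = [block_of(f) for f in facts]
    return len(ids) == len(set(ids))

def certain(db, k, satisfies):
    """Inflationary fixpoint over consistent sets of at most k facts."""
    blocks = {}
    for f in db:
        blocks.setdefault(block_of(f), []).append(f)
    candidates = [frozenset(c) for size in range(k + 1)
                  for c in combinations(db, size) if is_consistent(c)]
    # Step 1: start with the small sets that already satisfy the query.
    delta = {s for s in candidates if satisfies(s)}
    # Step 2: add s if some block has, for each of its facts a,
    # a member of delta contained in s ∪ {a}; repeat until stable.
    changed = True
    while changed:
        changed = False
        for s in candidates:
            if s in delta:
                continue
            for block in blocks.values():
                if all(any(t <= s | {a} for t in delta) for a in block):
                    delta.add(s)
                    changed = True
                    break
    return frozenset() in delta

def q_holds(facts):
    """Hypothetical query R(x, y) AND T(y, z), with two atoms (k = 2)."""
    return any(r[0] == "R" and t[0] == "T" and r[2] == t[1]
               for r in facts for t in facts)

# Key "1" of R is violated: it maps to both "a" and "b".
db_ok = [("R", "1", "a"), ("R", "1", "b"), ("T", "a", "x"), ("T", "b", "y")]
db_bad = [("R", "1", "a"), ("R", "1", "b"), ("T", "a", "x")]
```

With `db_ok`, both repairs satisfy the query, and `certain(db_ok, 2, q_holds)` reaches the empty set. With `db_bad`, the repair that keeps `R(1, b)` falsifies the query, and the empty set is never added. Enumerating all candidate sets is polynomial only because `k` is fixed by the query.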
Certain Answers of Extensions of Conjunctive Queries by Datalog and First-Order Rewriting
Reasoning about XML with temporal logics and automata
We show that problems arising in static analysis of XML specifications and transformations can be dealt with using techniques similar to those developed for static analysis of programs. Many properties of interest in the XML context are related to navigation, and can be formulated in temporal logics for trees. We choose a logic that admits a simple single-exponential translation into unranked tree automata, in the spirit of the classical LTL-to-Büchi automata translation. Automata arising from this translation have a number of additional properties; in particular, they are convenient for reasoning about unary node-selecting queries, which are important in the XML context. We give two applications of such reasoning: one deals with a classical XML problem of reasoning about navigation in the presence of schemas, and the other relates to verifying security properties of XML views
Reasoning About Pattern-Based XML Queries
We survey results about static analysis of pattern-based queries over XML documents. These queries are analogs of conjunctive queries, their unions and Boolean combinations, in which tree patterns play the role of atomic formulae. As in the relational case, they can be viewed as both queries and incomplete documents, and thus static analysis problems can also be viewed as finding certain answers of queries over such documents. We look at satisfiability of patterns under schemas, containment of queries for various features of XML used in queries, finding certain answers, and applications of pattern-based queries in reasoning about schema mappings for data exchange.
Datalog Rewritings of Regular Path Queries using Views
We consider query answering using views on graph databases, i.e. databases
structured as edge-labeled graphs. We mainly consider views and queries
specified by Regular Path Queries (RPQ). These are queries selecting pairs of
nodes in a graph database that are connected via a path whose sequence of edge
labels belongs to some regular language. We say that a view V determines a
query Q if for all graph databases D, the view image V(D) always contains
enough information to answer Q on D. In other words, there is a well defined
function from V(D) to Q(D). Our main result shows that when this function is
monotone, there exists a rewriting of Q as a Datalog query over the view
instance V(D). In particular the rewriting query can be evaluated in time
polynomial in the size of V(D). Moreover this implies that it is decidable
whether an RPQ query can be rewritten in Datalog using RPQ views
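To make the definition of a Regular Path Query concrete, the sketch below evaluates one over a small edge-labeled graph by searching the product of the graph with an automaton for the regular language. It illustrates RPQ evaluation only, not the Datalog rewriting result. The representation is an assumption: edges are `(source, label, target)` triples and the language is supplied as a hand-built NFA. The function name `eval_rpq` and the example data are hypothetical.

```python
from collections import deque

def eval_rpq(edges, nfa_delta, start, accepting):
    """Return all pairs (u, v) joined by a path whose labels the NFA accepts."""
    nodes = {u for u, _, _ in edges} | {v for _, _, v in edges}
    out = {}
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))
    answers = set()
    for src in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(src, start)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((src, node))
            for label, nxt in out.get(node, []):
                for state2 in nfa_delta.get((state, label), ()):
                    if (nxt, state2) not in seen:
                        seen.add((nxt, state2))
                        queue.append((nxt, state2))
    return answers

# Hypothetical graph database with a cycle between y and z.
edges = [("x", "a", "y"), ("y", "b", "z"), ("z", "b", "y")]

# NFA for the language a b*: state 0 reads 'a', state 1 loops on 'b'.
nfa_delta = {(0, "a"): {1}, (1, "b"): {1}}

pairs = eval_rpq(edges, nfa_delta, start=0, accepting={1})
```

Tracking visited `(node, state)` pairs keeps the search finite even though the graph has a cycle, so the evaluation runs in time polynomial in the size of the graph and the automaton.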